Router-Level Spam Filtering Using TCP Fingerprints: Architecture and Measurement-Based Evaluation
Abstract
Email spam has become costly and difficult to manage in recent years. Many of the mechanisms used for controlling spam are located at local SMTP servers and end-host machines. These mechanisms can place a significant burden on mail servers and end-host machines as the number of spam messages received continues to increase. We propose a preliminary architecture that applies spam detection filtering at the router level using lightweight signatures for spam senders. We argue for using TCP headers to develop fingerprint signatures that identify spamming hosts based on the specific operating system and version from which the email is sent. These signatures are easy to compute in a lightweight, stateless fashion. More importantly, only a small amount of fast router memory is needed to store the signatures that contribute a significant portion of spam. We present simple heuristics and architectural enhancements for selecting signatures that result in a negligible false positive rate. We evaluate the effectiveness of our approach on data sets collected simultaneously at two different vantage points, the University of Wisconsin-Madison and a corporation in Tokyo, Japan, over a one-month period. We find that by targeting 100 fingerprint signatures, we can reduce the amount of received spam by 28-59% with a false positive ratio of less than 0.05%. Thus, our router-level approach effectively decreases the workload of subsequent anti-spam filtering mechanisms, such as DNSBL lookup and content filtering. Our study also leverages the AS numbers of spam senders to discover the origin of the majority of spam seen in our data sets. This information allows us to pinpoint effective network locations at which to place our router-level spam filters to stop spam close to the source.
As a byproduct of our study, the extracted TCP fingerprints reveal signatures that originate all over the world but only send spam, indicating the potential existence of global-scale spamming infrastructures.
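The signature selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list which TCP header fields make up a fingerprint, so the `Fingerprint` fields, the function names, and the greedy selection rule are all assumptions. The sketch ranks fingerprints by observed spam volume and keeps up to `k` of them while the share of legitimate mail that would be blocked stays under a cap.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set, Tuple


class Fingerprint(NamedTuple):
    """Hypothetical signature built from TCP SYN header fields (assumed field set)."""
    ttl: int
    window: int
    df: bool
    options: str  # assumed encoding of the TCP option layout


def select_signatures(
    observations: Iterable[Tuple[Fingerprint, bool]],
    k: int = 100,
    max_fp_ratio: float = 0.0005,
) -> Set[Fingerprint]:
    """Greedily pick up to k fingerprints that carry the most spam.

    observations yields (fingerprint, is_spam) pairs from labeled traffic.
    A fingerprint is skipped if adding it would push the fraction of
    legitimate messages blocked above max_fp_ratio.
    """
    spam: Counter = Counter()
    ham: Counter = Counter()
    for fp, is_spam in observations:
        (spam if is_spam else ham)[fp] += 1

    total_ham = sum(ham.values())
    selected: Set[Fingerprint] = set()
    blocked_ham = 0
    for fp, _count in spam.most_common():
        if len(selected) == k:
            break
        if total_ham and (blocked_ham + ham[fp]) / total_ham > max_fp_ratio:
            continue  # too many legitimate senders share this fingerprint
        selected.add(fp)
        blocked_ham += ham[fp]
    return selected


def is_blocked(fp: Fingerprint, selected: Set[Fingerprint]) -> bool:
    """Stateless per-connection check: a single set lookup."""
    return fp in selected
```

Because the check is a single membership test against a small set, it needs no per-flow state, which is in line with the abstract's point that only a small amount of fast router memory is required.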
Similar resources
A Machine Learning Approach to Server-side
Spam-detection systems based on traditional methods have several obvious disadvantages, such as low detection rates, the need for regular knowledge-base updates, and impersonal filtering rules. New intelligent methods for spam detection, which use statistical and machine learning algorithms, solve these problems successfully. But these methods are not widespread in spam filtering for enterprise-level ...
A Distributed Mechanism for Identification and Discrimination of Non-TCP-friendly Flows in the Internet
This paper proposes the MUV (Misbehaving User Vanguard) algorithm for identification and discrimination of non-TCP-friendly best-effort flows. The operational principle of MUV is to detect non-TCP-friendly flows at the ingress-router by comparing arrival rates to equivalent TCP-friendly rates, i.e. the arrival rate of a TCP flow having the same round-trip-time and packet-loss probability. If a ...
Enterprise Anti-Spam Solution Based on Machine Learning Approach
Spam-detection systems based on traditional methods have several obvious disadvantages, such as low detection rates, the need for regular knowledge-base updates, and impersonal filtering rules. New intelligent methods for spam detection, which use statistical and machine learning algorithms, solve these problems successfully. But these methods are not widespread in spam filtering for enterprise-level ...
Performance Measurement Tool for Packet Forwarding Devices
A tool built primarily from public domain software components is presented to evaluate the performance of packet forwarding devices used in modern TCP/IP-based computer networks. The tool makes it possible to specify complex traffic patterns based on a high-level definition, then to execute the designed measurements and collect data, to process the results of measurements, and finally to generate ...
Personalised, Collaborative Spam Filtering
The state of the art sees content-based filters tending towards collaborative filters, whereby email is filtered at the MTA with users feeding information back about false positives and negatives. While this improves the ability of the filter to track concept drift in spam over time, such approaches make assumptions implicit in centralised spam filtering, such as that all users consider the sam...